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Abstract 

In the first phase of our work, we have concentrated on laying the foun- 
dation to develop fast algorithms. We have developed some basic codes 
for three-dimensional scattering and studied several candidate algorithms 
for speeding up the codes. These algorithms include using recursive struc- 
ture like the recursive aggregate interaction matrix algorithm (RAIMA), the 
nested equivalence principle algorithm (NEPAL), the ray-propagation fast 
multipole algorithm (RPFMA), and the multi-level fast nmltipolo algorithm 
(MLFMA). We have also investigated the use of curvilinear pal dies to build 
a basic method of moments code where these acceleration techniques c an be 
used later. 

In the second phase of our work, we have concentrated on implement ing 
three-dimensional NEPAL on a massively parallel machine, the* Connection 
Machine CM-5. We have been able obtain some 3D scattering result on the 
Connection Machine, CM-5. In order to understand the parallelization of 
codes on the Connection Machine, we have also st udied the parallelization of 
3D finite-difference time-domain (FDTD) code 1 with PML material absorb- 
ing boundary condition (ABC). We found that simple algorithms like the 
FDTD with material ABC can be parallelized very well allowing us to solve 
over a-million-node problem under one minute. In addition to the above, we 
have studied the use of the fast multipole method and the ray-propagation 
fast multipole algorithm to expedite matrix- vector multiply in a conjugate- 
gradient solution to integral equation of scattering. We find that these meth- 
ods are faster than LL1 decomposition for one incident angle, but are slower 
than LU decomposition when many incident angles an' needed as in Hip 
monostat ic RCS calculations. 
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CHAPTER 1 


NESTED EQUIVALENCE PRINCIPAL ALGORIHTM (NEPAL) 

IN THREE DIMENSIONS 


1. Introduction 

The computation of electromagnetic scattering of three dimensional ob- 
jects finds applications in many areas. Hence, it has been earnestly studied 
by many engineers, scientists and mathematicians alike. As a result, many 
algorithms have been developed for solving 3D scattering problems. Among 
these algorithms, finite difference time domain methods are popular due to 
their lower complexity and ease of implementations. However, many of these 
methods are for one excitation only, and the solution process needs to be 
repeated for multiple excitations. Moreover, absorbing boundary conditions 
are required for those algorithms. On the other hand, the solution to inte- 
gral equations automatically satisfies the radiation condition. As a result, an 
integral equation solver with computational complexity comparable to dif- 
ferential equation solvers is a viable alternative solution technique to these 
scattering problems. 

This paper presents an integral equation solver using the nested equiv- 
alence principle algorithm (NEPAL) which has been successfully applied to 
2D problems [1]. It is known that in solving an integral equation, one can 
first replace the volume scatterer by small subscatterers where the size of a 
subscatterer is much smaller than a wavelength. The unknown function to 
be sought is expanded in terms of basis functions which usually have their 
supports on the subscatterers. By matching the field on the subscatterers, a 
set of linear equations are formed. The number of unknowns is proportional 
to the number of subscatterers in this case. Physically, each subscatterer can 
be considered a scattering center. 

The interaction of a subscatterer with the other subscatterers can be 
described by interaction matrices [9]. If there are N subscatterers, then there 

will be N 2 interaction matrices since each subscatterer will interact with all 
the other subscatterers including itself. The N 2 interaction matrices can be 
found with N 3 operations [8,9]. The idea of NEPAL is to reduce the number 
of scattering centers, and hence to reduce the CPU time required for the 
solution. 


2 



Similar algorithms for inversion of a matrix using nested dissection method 
for the finite element method can be found in [10]. It is shown in [1] that the 
computational complexity of NEPAL is asymptotically N 1-5 for 2D problems. 
In this paper, we will first formulate NEPAL for three dimensional problems 
and show that the complexity in 3D is N 2 . In addition, we will use some of 
the symbols differently. For example, i is used either as a summation index 
or as the imaginary number y/—l. The meaning is clear from the context. 

2. Formulation of the Problem 

A three dimensional volume can be discretized into a set of small cubic 
boxes called subscatterers. Each subscatterer has volume A 3 . The scattering 
property of each subscatterer can be represented by a multipole with order 
proportional to the electrical size of the subscatterer. The scattered field can 
be written as [2], 

E( r ) = ^ {M nm (r - Ti) [bf ] nm + N nrn (r — i\) [b { ] jim } 

nm 

= V>‘(r - n) b,, (1) 

where, b; = [bf*,bf] e is the multipole coefficients, superscript M stands 

for TE component, and E for TM component. Here, i//( r ) = [M,N] is the 
vector spherical wave function, and M and N are given by [2,4] 

M nm (r) = 9^-z n (kr)Y nm (9,<f>) - 4>z n (kr ) — , 
sin0 o9 

N nm( r ) =f n (» + 1 ) 2n (fcr)y nm (g, 4>) + 9^ r [rz„(fcr)] 

A i 777 f ) 

+ <j> r ■ Q- [rz„(fcr)] Y nm {9, 4>). 
kr sind or 

n = 1,2,-*- , m = —n, •••,«. 

In the above, z n {kr ) is a spherical Bessel function or spherical Hankel function 
of the first kind depending on whether i/> £ (r) is an incoming wave or an 
outgoing wave, respectively. When Bessel function is used, means the 

regular part of ip, which replaces t/>. In the above, Y nm is a spherical harmonic, 
and it is defined by [4] 

= (~in / 2n , + 1 !’ 1 T i! lT P " l ( cos ^) ei ''‘' > ’ m 2 °> 

y 47r (n + ?nj! 
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M-1) W Kh(M), m<0, 

where, P” 1 is the associated Legendre function. 

Expressing the incident field in the same manner, 

E‘ nc (r) = 9fyV ,t ( r — ri) • -a,, (2) 

we can relate the unknown coefficient b, to a s via a T-matrix by matching 
the field on the boundary of the subscatterer [2,6,7]: 

b; = T • a IS • a,. (3) 

In the above, is a translation matrix [2,5], and a 5 is the incident wave 
amplitude. 

When A is small, the T-matrix is diagonal and close to that of a sphere, 
and can be found using the Mie series [2], When Nji subscatterers are present, 
the coefficients b; will be unknown which satisfy the following brute force 
equations [11,12]: 

Na 

b, = Ti • c*ii • a, + o.ij • hj , i = 1,2, ••• ,N A . (4) 

i= i 

The solution to the above equations can be written as 

Na _ 

b; — ^ ' It_7 ( ) ' &js ' i — 1> 2, • • • , Na' (5) 

}= 1 

Hence, the scattered field is given by 

N a _ Na _ 

E'“(r) = £ V>‘(r - r.) • • a„ ■ a,. (6) 

1 = 1 j = 1 

In the above, 3 — 1, 2, • • • , N A are called interaction matrices; they 

describe the interactions between N A subscatterers. They can be found in 
a variety of ways, for example, by Gaussian elimination, or by a recursive 
algorithm [8,9]. 

3. Equivalence of Interaction Matrices. 

As shown in the above section, the scattered field can be determined by 
the N\ interaction matrices for a group of N A subscatterers. In this section, 
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Definition : Two sets of interaction matrices I ij(N A ) and 
I ij{N B ) are said be equivalent if they generate the same 
scattered field via Equation (6) for the same incident field. 


we will show that there is another set of interaction matrices which will 
generate the same scattered field. To this end, we first give a definition of 
equivalent interaction matrices. 

In the following, we will give a mathematical description for the equiv- 
alence of the interior subscatterer by those on the surface using Huygens’ 
equivalence principle [3]. 

First, we assume that S is the boundary of a volume region V, and V 
contains sources which generate field E and H on the boundary. Next, we let 
r' be a point outside V , as shown in Figure 1. Huygens’ equivalence principle 
states that the field at r' due to sources inside V can be represented via 
equivalent sources on the boundary: 


E(r') 



x E(r) • V x G e (r, r') + iujjih x H(r) • G e (r, r')] , (7) 


where, 


G e (r,r') = 



e ‘ fc K- r 'l 
47r|r — r'| 


(7a) 


is the free space dyadic Green’s function. Equation (7) has a double roles. 
First of all, it provides an indirect approach to compute the field at points 
outside V . Secondly, it tells us how to construct the equivalent sources: the 
equivalent sources are simply the tangential components of the fields. 

Apart from the above two points, we also observed that: (1) The equiv- 
alence principle does not specify the type and the number of the original 
sources inside V ; they can be induced sources, or even some other equivalent 
sources. Also, there could be only one point source, more than one point 
source or distributed sources. (2) The equivalent sources are uniquely de- 
termined by the shape of the surface and the density of the tangential field 
components. This gives us the possibility of replacing a large number of 
sources with a relatively small number of sources. 

The mathematical representation of a source can be different. For exam- 
ple, Equation (2) represents a direct source by its multipole amplitude, while 
Equation (6) represents the induced sources by a set of matrices Iq^), i,j = 
1,2, •• • , Na- Similarly, we can represent the equivalent sources in the same 
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manner, i.e., another set of interaction matrices I ij(N D ),hj = 1,2, ••• ,Na. 
We now construct the relation between the two sets of matrices using Equa- 
tion (7). 

First, we divide the surface S into N D small surfaces or patches, where 
each patch is of area AS. As a result, Equation (7) can be rewritten as 


E(r') = — V"' / dS [n x E(r) ■ V x G e (r, r') + iojfm x H(r) • G e (r, r')j . 

i=i 

( 8 ) 

At the z-th patch, we let r = r" + r*. Here, r" is the local coordinate for 
the z-th patch. Then, 


r = 


r." _1_ „ 

r + r* 


r = r 


(r - r,). 


(9) 


Under the condition that |r' — | > |r"|, the Green’s function G e (r,r') can 

be expanded as (r' = r' — r* , [3, p.409j) 


G e (r", r ;) = 




[%M n ,_ m (r")M nm (r') + 5R 5 N„,_ m (r")N„ m (r')] . 

( 10 ) 


n(n + 1) 

When this expansion is applied to Equation (8), we have 


E(r') = Y, f dS " x E ( r ) ’ ^N n ,_ m (r ,, )M„ m (r') 

^n(n + l); ASi L 

+n x E(r) • %M„ _ m (r")N nm (**,' ) 

+irjh x H(r)-%M n> _ m (r")M mn (r') 

+ir)h x H(r)-^N n ,_ m (r")N nm (r')], (11) 

where, 77 = \j Jto/To is the free space impedance. 

Equation (11) can be written more compactly as 

E(r') (M-W) ' l b ."U + N-«) • [b?]„„.} , (12) 


where, upon making approximations to the integrals, 

-ik 2 (~) m 


[W 


Ml 


n(n + 1) 


■AS 


hi x E(r { ) • 5R<?N n _ m (0) 


(13a) 
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(13b) 


j 1 +^ ( r T 

[bf ]„m = — 7 — rrv AS iTjfli x H ( r ») ' %n„ _ m (o) L 

n(n -f 1) L J 

where, «» is the average normal of the i-th patch. 

In deriving Equation (11), the identities 

V x M = JfcN, V x N = JbM 

have been used. Equation (12) gives the source-field relation for the equiva- 
lent sources on the surface S. 

We assume that the source-field relation for the interior sources is of the 
form as Equation (1) with multipole amplitude given by a.j, j = 1,2,..., AT/. 

Then, 

E(r) = E{ M «*( r - r,) • [a"]„„ + N„„(r - r,) • [af]„„}, (14a) 

H(r) = i ^{N„„(r - r,) • [af ]„„ + M„„(r - r,-) ■ [afl„„}. (14b) 

We let r = r, in Equations (14a) and (14b), and substitute them into 
Equation (13a), we have 

l b "u =^b^AS» !7 N n ,_ m (0) 

n{n + 1) 

• ^ ^ x 4- fii x N l/ p(rjj) • [ a j 

or 

[b"U = E {P""U"M • b" W + ' [afU • (15a) 

h u h 

Similarly, 

[bf]„„ = E {tS§"W» ■ !•?]«. + ■ [ofUl . (15b) 

where, r_j; = r; — r_, and 

_ 7 * t-2 / \rr» 

- .■ . -L i-A5^N»,_ m (0) • fq x (16a) 

J n(n +1) 

Ic^U.-* = ~f 2( ~ 1 ) “ AS!R g N„,-,„(0) • fn x N„„(r„), (16b) 

7 n [n + l ) 
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\c EM ] 

Lij 

= Wi E \ 


(16c) 

[c**l 

-- [c T. 


(lOd) 

1 




II 

\o 

II 

C iJ 

rEM 

L V 

1 

ej 
; lo 

1 


Denoting 


we have the relation between the equivalent sources and the original sources: 


b > = Y, h i? ' a i- 


where b, = [b 1 M ,bf] t and = [a^aj 7 ]'. If there is only one interior source 
located at r j inside V, then the field at r' corresponding to this source is 
given by (14a) with Na = 1, or 

E(r') = V>V - r_,) • aj. 

This is the direct source-field relation. On the other hand, we have an indirect 
source-field relation as Equation (12), or 


No _ _ 

E ( r ') = " r *) * b? ‘ a i‘ 

*=i 

Equating the two representations, we have for arbitrary aj, 

_ Nb _ _ 

V»‘(r- r i) = 2^ t (r-ri)-hS“. (17) 

«=i 

Equation (17) is the first equation which will be used in deriving the equiv- 
alence between interaction matrices of interior subscatterers and boundary 
subscatterers. 

Now we consider the reverse problem: the sources are outside of V and 
we need to compute the field at r' inside V due to the outside sources. 

Using the similar steps as in deriving h-°,\ we can write the field E(r') in 
the same form as Equations (12), (13a) and (13b), except that r' is insider V 
in this case. 

Suppose that the source is located at r s and the multipole amplitude a, 
for the source are known, then the fields at r due to this source are given by 
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(18a) 


E(r) = ^{M^(r - r.) • [af ]„„ + N„„(r - r 5 ) • [af]„„}, 

H(r) = 7" S{ N ^( r ~ r *) ' l a ^l^ + M ^*( r ~ r *) • [ a f]^*}- ( 18b ) 

Equation (18) gives an outgoing wave centered at r 3 , and the field point r 

is not specified yet. Now, we let r be a point in the vicinity of r* € S, with 

i = 1,2, ..., N D . Observe that at r, the field can also be considered incoming 
waves centered at r;. This is more rigorously given by the wave translation 
equation[4,5]: 

M(r - r 3 ) = %M(r - r<) • A ia + 3?<7N(r - r { ) • B is , 

N(r - r.) = 5%M(r - r<) • B ia + %N(r - r { ) ■ A is . 

Therefore, 

E(r) = E{ R 9 M *<‘( r “ r <> • I a "W + % N -/.( r - r i) • l a SW}' ( 18c > 

H(r) = f ]C{* 0 *Wr “ r ‘) ' + ^M„(r - r { ) • [a?]„ p }, (18d) 

lT ) vn 

where, a^ is related to a 3 by 



Substitution of (18c) and (18d) into Equation (13a), we have 

wi- = 

■fiiX^2(RgM uti (0) ■ [aff]^ + %N^(0) • , 

* 'V 


or 


KU = E {[<=""! — -/< ■ K?W + ['“U • [«£]„.} • (19a) 


Ufi 


Similarly, 


ibf> 


J nm 


EU c . £ 


EM I 


, . f a M l 4- \c EE ] . \a E ] \ 


(19b) 


vp 
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Hence, 


where, 


bj Cj ■ a 5 — 


-MM -ME 


r MM] 
. i 


\ r ME] 

S nm } vfi 


-zfc 2 (-) m 
n(n + 1) 

n(n + 1) 


■A5RpN ni _ m (0) • hi x 5RpM^(0), 
■AS3fyN n _ m (0) - hi x DfyN^O), 


In the above, 
and 


r c £A/j = C ME| 


\c EE ] = fc MM l 


»0M„„(O) = 0, 


%N^(0) = 0, i/^l, 


%n 1iT1 (0) 

%N 1>0 (0) 


2 / 3 . „ ... 

— i/ — (±x — *y), 
3V87r V 



2 . 


(19c) 

(20a) 

(20b) 

(20c) 

(20d) 


Now, assume that the source is located outside V at r 4 , is an interior 

point, the incident field in the vicinity of r ; due to source at r 3 can be written 
as 

E mc (r) = 9tgip l (r - r,) • a js • a s . (21) 

Applying the equivalence principle, this field can be thought of as coming 
from the equivalent sources on the boundary: 


n b _ 

E inc (r) = X>‘( r - ri )- b,-, 


1 = 1 


( 22 ) 


where b; is given by (19c). 

Using translation formula to translate i/^r — r;), the outgoing wave orig- 
inated at r { , to an incoming wave centered at r^, we have, 

t/>‘(r - r.) = r - rj) ■ a j{ . (23) 
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Substituting i//(r — r,) in (21) by (23), and equating the resultant equation 
and (22) for arbitrary point r near r^, we obtain: 


Nb N d 

^ ^ ^ bjj * OC; 5 ■ a S) (24) 

i = 1 »=1 

with 

hj? = «j, • Ci. (25) 

Equation (24) is the second equation which will be used for our equivalence 
problem. 

With Equations (17) and (24), we can easily relate the equivalent inter- 
action matrices l,j(N D ) with the original ones To this end, we start 

with Equation (6). We replace i/’ < ( r ~ r 0 of Equation (6) by Equation (17), 
and replace oc is • a 3 by Equation (24), to obtain 

Nn N A N a No 

E,C °( r ) = Y2 1 / ,f ( r _ r m) ' ^m! ’ hjm' ‘ «m' S • a 3 . (26) 

m = 1 i = l j~\ jn 1 = 1 

The above can be rewritten as 

n b n b 

E 3 ca (r) = ^^(r- r m ) ■ I m m-(/v u ) ■ a m >, • a,, (27) 

m=l m' = l 


where 

n a _ n a _ _ 

l mm'(N D) = E h -! • EWxl * h !w- ( 28 ) 

«=l i=i 

Equation (28) specifies the equivalence relation between the original interac- 

tion matrices and the equivalent ones. This is the key equation for NEPAL. 
We will explain how this equation is used to reduce the number of interaction 
matrices. 

For a volume scatterer, the total subscattcrers arc divided into boundary 

subscatterers and interior subscatterers. If we define hj°- = h^ = I for i = 7?i, 
then the right hand side of equation (28) may also include interaction matrices 
of boundary subscatterers. Let N/ t be the total number of subscatterers, Nb 
is the number of boundary subscatterers. Then the number of interaction 
matrices is reduced from N A to N \ since Nb < N A . Furthermore, there is no 
violation of addition theorem in using this equivalence principle. 
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4. The Nested Equivalence Principle Algorithm (NEPAL) 

As mentioned in the introduction of this paper, the idea of NEPAL is 
to reduce the number of scattering centers, or the number of interaction 
matrices, as shown by Equation (28). In this section, we will describe the 
steps of this algorithm . 

To begin, the subscatterers are first divided into different levels of groups 
in a nested manner, i.e. , a group of one level is divided into two subgroups 
of the next lower level. Each subgroup is again divided into two subsub- 
groups of one level lower than that of the subgroup, and so on. The process 
continues until the lowermost level of subgroups which contains 64 subscat- 
terers is reached. In each level, we find the scattering solution for each of 
the subgroups, and then use Huygens’ equivalence principle (Equation (28)) 
to replace the interior subscatterers of a subgroup by subscatterers on the 
boundary. For example, in the lowermost level, we first solve for the scat- 
tering solution for each subgroup of 64 subscatterers. Then Equation (28) 
is used to remove the interior subscatterers. In this level, there are only 
eight interior subscatterers for each subgroup. After this step, each subgroup 
contains 56 subscatterers with known solution. 

Next we go to the next higher level-the second level. In this level, each 
subgroup contains 112 subscatterers, as it is made up of two subgroups of 56 
subscatterers. The solution to the 112 subscatterers together is not known. 
Hence, we first solve for the scattering solution for each subgroup with 112 
subscatterers, and then their interior subscatterers (8 for each subgroup in 
this level) are removed via Equation (28) and 104 subscatterers remain in 
each subgroup with known solution. 

We can see that at each level, upon removal of the interior subscatterers, 
a subgroup contains less subscatterers than what it originally contained. The 
process is continued until the highest level is reached, where there is only one 
group which is made up of two lower level subgroups. Again, the solution 
must be sought for the group. However, since interior subscatterers for the 
two subgroups are removed, this group contains much less subscatterers than 
N a if N a is the total number of subscatterers for the problem. As a result, 
much less operations are required to find the solution. 

It is seen that the operation at each level contains two parts, one part 
is to find the scattering solution, the other part is to remove the interior 
subscatterers. It is important that the removal of the interior subscatterers 
of a group does not change the scattering property of the subgroup. The 
solution of interaction matrix can be found in the same way as described in 
[ 11 - 

Using similar analysis as presented for the 2D case, the computational 
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complexity can be shown to be CN 2 for N spheres, where C is roughly 900. 


5. Parallelization of the Code on CM-5 

There are two major processes in implementing NEPAL: they arc the 
interaction matrix algorithm (IMA) process and the equivalence principle 
process. The IMA consumes most of the CPU time. When NEPAL was 
first implemented on the Thinking Machine Corp., CM-5, the code calls an 
external function in the library of CM-5. This function inverts a matrix by 
Gauss-Jordan method (the name of the function is “gen_gj Jnvert”). When 
this function is called in the IMA part, we found that the CPU time for 
matrix fill is dominant, because the matrix inverse is greatly expedited. This 
was not the case on a serial machine. Hence, we seek to expedite the matrix 
fill as follows. 

The matrix involved in the IMA can be written as 

A= [A,] 


where A are block matrices. 


f * = A 

l -T fc(i)-a 0 -, i ± j, 


and Ti(q are the isolated T-matrix. c*ij are the wave translation matrices[2]. 
For this specific problem, the size of the subscatterrer is small compared to 
wavelength. As a result, the size of the translation matrices is 6 x 6, as it can 
be given explicitly as: 


A ij — 


ex 

P 


P 

oc 


where 


[cty] 12 = ^=/i 2 sin 6ij cos 6ije l4,ij 


[<*ij ]22 = /lo + /i 2 (l-5cos 2 dij - 
[dtyjn = -0.75 sin 2 6 l] h 2 e' 2 ' t ‘‘ ] 
[cXij]33 = h Q — -([ajj] 22 — /lo) 

\Pij]n = ^/iisin0 o -e^ 
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[/^i j ] 33 2^1 COS 0{j 

["O'] 11 = ["O’] 33 
["o]2i = ["o]i2e -,20,J 

["o]23 = ["ij]l2 

[" 0 ] 3 1 = ["o]i3e"‘ 4<Au 

["o]32 = ~["o]l2 

[£M n = -[^o] 33 

[Pijhx = 

[^O'l 23 = [^ol 12 

[P t j}32 = [Pijhl 

[Aj]l3 = ~[Pij}22 = [Pij] 31 = 0. 

In the above, h 0 , hi, and h 2 are the spherical Hankel functions with the 
appropriate arguments. With this explicit expression, it is easy to parallelize 
the matrix-fill part. We know that one of the strength of CM-5 is on matrix 
operations. Hence, instead of using “do loops” in standard FORTRAN to 
compute r i: ,, 6ij, fa, and h 0 (krij), etc., for each i, j = 1,2, . . . , N, we use the 
statement on the CM-5, 

for all (i=l:n, j=l:n) r (i , j ) =sqrt ( (x(i) -x( j ) ) **2 

+(y(i)-y(j))**2 

+(z(i)-z(j))**2) 

to compute r,j, and similarly for 6ij, (frij. Having obtained the above, we 
can compute the following efficiently by performing matrix operations. For 
example, the CM-5 FORTRAN code will appear as 

h_0=exp(i*k_0*r) / (k_0*r) 

h_l=h_0* (1/ (k_0*r) -1) 

h_2=3*h_l/ (k_0*r)-h_0 

beta( : , : ,3,3)=1.5*i*cos (theta) *h_l 

This paprallel implementation makes the matrix-fill part a small portion 
of total CPU time. A similar proceduce is used to parallelize the computation 

of and matrices. 

We estimate a throughput of about 1 GFLOPS on a G4 processor partition 
of the CM-5. 
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6. Numerical Results 

Using this algorithm, we have developed a program to compute the scat- 
tering solution of rectangular cubic dielectric scattcrers. In our numerical 
simulation, 6 spherical modes (3 for TE and 3 for TM) arc used to expand 
the scattered field of a subscattercr. Figure 2 and Figure 3 show the scattered 
field (magnitude and phase, respectively) of a cubic dielectric scattcrer with 
side length a = b = c = 1.6A and e r = 2.6. The incident wave is coming 
from 9 = 180° and <f> = 0° with 9 polarization and frequency of 300 MHz. 
The scattered field is observed at 9 = 0° to 9 = 180° and (f> — 0°. Only the 
Eg component is ploted. The results agree well with that of the brute force 
solution using Gaussian elimination. 

In Figure 4, we compute the RCS of a low observable target which is a 
foamy cylinder of diameter 1.2 A 0 , and length 2.1 Ao, with e r = 1.05 + t0.2. 
It took about 2,000 s of CPU on a 128 processor partition of the CM-5, 
and 2.66 GB of memory. The result compares reasonably well with a brute 
force method. The brute force method uses a Neumann type iteration, and 
converges quite quickly because of the low contrast of the scattcrer. For the 
computation of 31 points using 9 iterations, it took about 600 s of CPU on 
the CM-5 with 329 MB of memory. 

As for the computational complexity, Figure 5 shows the comparison of 
the CPU time (on CM-5) of this method (NEPAL) with the brute force 
solution using LU decomposition. It is seen that when N, the number of 
unknowns, is small, NEPAL is not as efficient as brute force. However, when 
N is large, NEPAL uses less CPU time than brute force. The cross over 
occurs at about N = 1800. It is also seen that the slope of the CPU time 
curve for NEPAL is decreasing, and approaches the slope of an N 2 curve. 

7. Conclusion 

We have presented in this paper the extension of the nested equivalence 
principle algorithm (NEPAL) to three dimensions. The algorithm is based on 
Huygens’ equivalence principle and nesting small algorithms within a larger 
one. Therefore, the key element is to divide the computation into several 
stages and reduce the number of unknowns at each stage. This represents 
an efficient algorithm for directly solving the integral equation of scattering 
with reduced computational complexity of 0(N 2 ). Hence, it can be used to 
compute the scattering solution of large objects for many incident waves. 
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Figure 1. The Huygens’ equivalence principle. 
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Figure 2. Scattered field (magnitude) of a cubic dielectric scatterer 
for plane wave incidence at 6 = 180°. Side length = 1.6A, e r = 2.6, 
/ = 300 (MHz). Field points are on the circle of r = 10A. The cubic 
is divided into 512 small subscatterers. 
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Figure 3. Scattered field (phase) of a dielectric cubic with the same 
parameters as in Figure 3. 
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Figure 4. Comparison of the RCS of a low-observable target with a 
brute-force iterative approach. 
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Figure 5. Comparison of CPU time of NEPAL and Gaussian elimi- 
nation. 
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CHAPTER 2 


PARALLELIZATION OF FDTD CODE ON CM-5 
USING PML MATERIAL ABSORBING BOUNDARY CONDITION 


1. Introduction 

The finite difference time domain method [1,2] is widely regarded as one 
of the most popular computational electromagnetics algorithms. Although 
FDTD is conceptually very simple and relatively easy to program, the method 
is actually quite efficient since it involves 0(N l b ) computational complexity 
in 2-D and 0(N 1 ' 33 ) computational complexity in 3-D [3]. In fact, FDTD 
can be considered an optimal algorithm since 0(N a ) numbers are produced 
in 0(N a ) operations. 

FDTD is also ideally suited for implementation on a single-instruction 
multiple-data (SIMD) massively parallel computer. The reason is that the 
stencil operations that must be computed at each node of the space grid in- 
volve only nearest-neighbor interactions and may be implemented at a mini- 
mum communication cost [4]. A major challenge, however, is in implementing 
absorbing boundary conditions (ABCs) at the edges of the FDTD grid. On 
scalar and vector computers, these boundary conditions are typically com- 
puted using methods such as the Engquist-Majda [5], Mur [6], Liao [7] or 
Higdon [8] ABC. However, these methods are not ideal for parallel super- 
computers since they all involve communication with many elements normal 
to the grid boundary. Such communication can easily surpass the time spent 
computing core FDTD operations in the grid interior, especially for higher- 
order boundary conditions, and hence can become a bottleneck in the FDTD 
code. Also, they do not allow for SIMD operation on a parallel machine 
without the use of masking. 

An alternate method of implementing an ABC is to use a conventional ab- 
sorbing material boundary [4, 9-14]. For SIMD parallel computation, these 
methods have the advantage that the ABC may be implemented with the 
same FDTD stencil operation as the interior nodes by modifying the conduc- 
tivity material parameter at the edge of the FDTD grid. The disadvantage is 
that the reflection coefficient at the absorbing border is zero only at normal 
incidence and is both angle and frequency dependent. Consequently, the ab- 
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sorbing material border region must be made quite large — typically 20-100 
grid points along each edge in order to minimize reflections. 

Recently, Berenger [15] suggested a more general method of implementing 
an absorbing material boundary condition. Berenger proposed a procedure 
for 2-D wave propagation whereby Maxwell’s equations are generalized and 
added degrees of freedom are introduced. The added degrees of freedom allow 
the specification of absorbing borders with zero reflection coefficient at all 
angles of incidence and all frequencies. Moreover, the generalized Maxwell’s 
equations reduce to the familiar Maxwell’s equations as a special case and 
hence the same generalized equations can be used to propagate fields in both 
the interior region and absorbing region. Although the interface between 
the interior region and the absorbing boundary is reflectionless, there is still 
a reflection from the edge of the grid. The advantage of using Berenger’s 
procedure is that much larger conductivity values may be specified in the 
absorbing region, leading to a drastic reduction in the number of grid points 
required for the absorbing boundary. 

In the present paper, a formulation similar to the Berenger idea is derived 
for 3-D wave propagation from first principles using a coordinate stretching 
approach. The advantage of the new method for SIMD parallel computation 
is stressed. The method is validated with 3-D FDTD numerical computations 
on a Thinking Machines Corporation Connection Machine CM-5. 

2. Modified Maxwell’s Equations 

For a general medium, we define the modified Maxwell’s equations in the 


frequency domain, 

assuming e lu>t time dependence, as 




V e x E = iupLil 

(1) 



V h x H = —iuc E 

(2) 



V/, • eE = p 

(3) 



V e • /xH = 0 

(4) 

where 

V e = 

-Id .13 .Id 

: x ~~tt + y~"^r + 2— o“ 

e x ox e y oy e z oz 

(5) 


v h = 

-Id .Id .Id 

X h x dx + ^ h y dy h z dz 

(6) 
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In the above, Ci, hi, i = x, y , z are coordinate-stretching variables that stretch 
the x,y,z coordinates for V e and V/,. It shall be shown later that when Ci 
and hi are complex numbers, the medium can be lossy. Note that (3) and (4) 
are derivable from (1) and (2). A general plane wave solution to Equations 
(1) - (4) has the form 


E = E 0 e' k r 

and 

(7) 

H = H 0 e ,kr 

(8) 

where k = xk x + yk y + zk z . Substituting Equations (7) and (8) into Equations 
(1) and (2) above gives 

k e xE = ujy , H 

(9) 

k/j x H = — (jjeEi, 

(10) 

where k c = x^ + ii^ + z^ and k/, = xj*- + yr L + if L - Combining the above, 

Cjb Cy C z fljj . fly flj U 

we have 

—u 2 ue H = k e x k/, x H 

(11) 

= k,,(k e • H) - (ke • k h )H. 

But from Equation (9), k e • H = 0 for a homogeneous medium, 
the dispersion relation 

This gives 

u 2 fie = k e • k,, 
or 

(12) 

r 2 1 1.2 , 1 jl2 , 1 1.2 

e x h x 1 e y h y y e z h z 

(13) 

where k 2 = w 2 pe. Equation (13) is the equation of an ellipsoid in 
satisfied by 

3-D and is 

k x = k\/ e x h x sin 8 cos 0, 

(14) 

k y = K\/eyhySmdsm(j), 

and 

(15) 

k z — e z h z cos 8. 

(16) 


Note that when e*, /q, i = x,y,z are complex, the wave in the x, y, and z 
directions are attenuative and can be independently controlled. Under the 
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matching condition, e x = h xt e y = h y) and e z = h z , we have |k c | 2 = |k/,| 2 — k 2 . 
The wave impedance is then given by 


E[ _ W _ //f 

| H | we \ e' 


(17) 


irrespective of the values for ei,i = x,y,z and the direction of propagation. 


3. Single Interface Problem 

Assume that a plane wave is obliquely incident on the interface 2 = 0 
in Figure 1. Furthermore, we may assume that the plane wave is of arbi- 
trary polarization. The incident field may be decomposed into a sum of two 
components, one with electric field transverse to z (TE‘) and the other with 
magnetic field transverse to z (TM Z ). We will examine these two components 
individually. 

In the (TE Z ) case, we let the incident field in region 1 be given as 

E ; = E 0 e ,k ‘ r . (18) 

In the above, k,„ • E 0 = 0, and E 0 is in the xy plane. Similarly, we define 
the reflected field in region 1 as 

E r = R te E 0r e' kr r (19) 

and the transmitted field in region 2 as 

E t = T te E ot e ik, r . (20) 

Phase matching requires that k ix = k rx = k tx and ki y = k ry = k ly . Hence, 
we can define E 0r = E 0 t = E 0 since they all point in the same direction. 
Applying the boundary condition that the tangential electric field must be 
continuous across the plane 2 = 0,* we have 

1 + R te = T te . (21) 


The magnetic field may be determined using Equation (9) for regions 1 
and 2 as 

Hi = k ‘ e X E ° - e ,k, r + R te kre X E ° - e‘ kr r 

LJfl I 


( 22 ) 


* This boundary condition follows from the modified Maxwell’s equation (1). 
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and 


(23) 


II, = T te k “ X E ° c' k " 

UJH2 

where k; e = -f 7 / + z^ 1 and similarly for k rc and k tc . We also define 

ku = fc» 2 , ^22 = k tz and note that k rz = — k\ z . Then equating the tangential 
components of Equations (22) and (23), we have 


ki z e 2z fj, 2 [l - R TE ] — T TE k 2z e iz fi l . 
Combining Equations (21) and (24), we have 

R TE 


(24) 


and 


}TE k\ z C 2z [J , 2 k 2 z Ci z f.li 
ki z e 2 z fi 2 + k 2 z ei z fii 

j-,te 2 k\ z e 2z fi 2 


kue 2 z fi 2 + k 2 z e\ z fi\ 

Applying a similar procedure to the TM 2 component, we have 


(25) 


(26) 



j^tm k\ z h 2z e 2 — k 2z h\ z ti 

(27) 


k\Jl2z€2 + ^2zh\ z ei 

and 

rp TM _ 2/Ci z /l2z^2 

(28) 


k\ z h 2z t 2 + k 2z hi z ei 

4. A Perfectly Matched Interface 


The phase matching condition requires that k\ x = k 2x and k iy 

= & 2y , or 


k 1 \J ei x hi x sin 8 1 cos <f> l = k 2 \/ e 2x ^ 2 x sin 6 2 cos </> 2 

(29) 

and 




k i \Jeiyhiy sin sin 4>i = n 2 y/ e 2y h 2y sin 0 2 sin (f) 2 

(30) 

where = 

and k 2 = u^/fi 2 e 2 . For a perfectly matched medium, 

we choose €1 

= e 2 , Mi = M 2 i e x = h x and e y = h y . Equations (29) and (30) 

become 

e lx sin 6\ cos 4 > i = e 2l sin 0 2 cos 0 2 

(31) 

and 

ei y sin 81 sin </>i = e 2y sin 0 2 sin (f> 2 . 

(32) 
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If we now choose c ix = e 2l and e iy = e 2y , then = 0 2 , = (j) 2 and we 

can show that both f? TE = 0 and R™ = 0 for all angles of incidence and all 
frequencies. 

If region 1 is a vacuum, then /t = /i 0 , e = c 0 , and 

(^lx i ^ ly > Cli , /llx j hiy , ) ( 1 1 1 i 1 1 1 1 1 1 1 ) ■ 

In order to have a lossy region 2 with no reflections at the region 1/region 2 
interface, we choose 


(c 2 x> C 2 y, ^2z 1 h-2 x j h 2 yi Al 2 z ) = (l,l,s 2 ,l,l,s 2 ) (34) 

where s 2 is a complex number. In this case, 

kh = k 2x = n 0 s\n6cos<f) (35) 

ki y = k 2y = K 0 smO siiuj) (36) 

A. i z = /c 0 cos 9 (37) 

k 2z = k 0 s 2 cos 9 (38) 


where k q = w^/poCo- If s 2 = s 2 + ts 2 , the wave will attenuate in the 2 
direction. This kind of interface is useful for building material ABCs in a 
FDTD simulation. 


5. Modified Equations in the Time Domain 

For the general case of a matched medium, we let e x = h x = s x , e y 
h y = s y and e z = h z = s z . Then, V e = V h = x±£ + 

Equation (1), we write the curl as 


In 


10 1 3 , 10 . 

V e x E = —7 —x x E H —y x E H —2 x E. 

Sx ox Sy ay s z az 


(39) 


Then, defining H 3ji , H- , and in terms of the components of Equation 
(39), we let 

iu>fiR lx — — —x x E (40) 

S C/ 

1 d 

*W/X H s = — — y x E (41) 

s y oy 
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and 


( 42 ) 


iu>H H 3 = — — — z x E 
s z dz 

where H = H, a 4- H Ju + H ai . Similarly, we can write Equation (2) as 


-iu;eE Sx 


Id. 

~— x x H 
s x ox 


(43) 


. „ Id. TT 

-ttoeE s = — — y x H 
Sy dy 

and 

. „ 1 d . „ 

~iu>e E, = — — z x H. 
‘ s 2 dz 


(44) 


(45) 


where E = E ai + E Sy -t-E s , . Note that H 5i , E ai , i — x, y , z are two-component 
vectors. 

We now let s x = l + icr x /u>e, s y = l+icr y /u)e and s z = 1 + io z /u;e. Writing 
Equations (40) - (42) and (43) - (45) in the time domain, we have 


A* 


d 

dt 



d - „ 

—z-x x E 
ox 


(46) 


and 


5H 3 cr y p d . 

"-af + -r H - = ~Ty v x E 

8H, ^ = a 

M dt e s ‘ dz 




aE, a 

e— ^ + (t v Ej = — y x H 
dt v 9 dy y 


^ + oqE, = ixH. 
at • az 


(47) 

(48) 

(49) 

(50) 

(51) 


Equations (46) - (51) described 3-D wave propagation in a perfectly 
matched medium. The wave propagation phenomenon described by these 
equations is very similar to that described by Maxwell’s equations with the 
exception that attenuation may be controlled through the o x , o y and o z vari- 
ables. The FDTD implementation of these equations on a Yee FDTD grid is 
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straightforward. Absorbing boundaries at the edges of the simulation region 
may be created by choosing appropriate values of a x , a y and o z . Equations 
(46) - (51) may be seen to include Berenger’s equations [berengerl] as a 
subset for the 2-D TE or TM case. 

The above equations involve 12 components of electromagnetic fields. For 
a free-space/lossy medium interface, a scheme may be devised using only 10 
field components for the 3-D case, and only 3 components for the 2-D case. 
However, this is achieved at the loss of SIMD operation on a parallel machine. 


6. Computer Simulation Results 

In order to demonstrate the new method, a 3-D orthogonal grid FDTD 
algorithm was developed based on Equations (46) - (51). The FDTD algo- 
rithm was implemented as a SIMD code on the Thinking Machines Corpora- 
tion Connection Machine CM-5. The algorithm operates very efficiently on 
the CM-5 because the FDTD stencil operations that need to be computed at 
each node involve only nearest-neighbor interactions. The communication op- 
erations resulting from the nearest-neighbor interactions are at a minimum 
cost since the neighboring processors are for the most part at the bottom 
of the fat-tree communication network, where communication bandwidth is 
maximum. 

To validate our 3-D FDTD algorithm, we solved a simple problem of 
computing the field radiated from an infinitesimal electric dipole in free space. 
An analytic solution was also computed in the frequency domain for many 
excitation frequencies. The frequency domain solution was then multiplied by 
the spectrum of FDTD source pulse and inverse Fourier transformed to yield 
a time-domain analytic solution for comparison with the FDTD solution. 

The FDTD solution was solved in a cubic region of dimension 


(N x , N y ,N z ) = (128,128,32) 


grid points. The grid parameters chosen were Ax = Ay — Az = 2.5mm, 
At — 4.5ps and N t = 512 time steps were computed. 

The infinitesimal electric dipole was simulated by exciting the E y field in 
a single grid cell with the source pulse 


j,m = 


i 

AxAyAz 


[4((/t) 3 - (1/t) 3 ] e~ ,,T 


(52) 


where r = 1/47 r/ 0 and a value of f 0 — 1.0GHz was chosen. The dipole 
source was located at grid location (n x , n y , n z ) = (91,64,16). The E x and 
E y fields were obtained by sampling the fields at grid location {n x ,n y) n z ) = 
(37,91,16). 
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The absorbing boundaries used for the FDTD simulation consisted of 
planar layers of thickness 8 grid points on all surfaces. Along the borders 
parallel to x axis, the value of a x was specified, while a y and cr. were specified 
on the borders parallel to the y and 2 axis, respectively. The conductivity 
values were chosen with a parabolic taper decreasing from the maximum value 
towards the center of the grid such that the reflection coefficient at normal 
incidence was Rq = .0001. 

The E x field computed using both the analytic formulation and the FDTD 
algorithm are overlaid in Figure 2. The curves due to the analytic and nu- 
merical solutions are barely distinguishable, indicating excellent agreement. 
Similarly, the E y field due to the analytic and numerical solutions are over- 
laid in Figure 3. Again, we see excellent agreement. Any difference between 
the analytic and numerical solutions in Figures 2 and 3 may be attributed 
to modeling errors such as the finite size of the dipole source and the dis- 
crete approximation of Maxwell’s equations in addition to reflections due to 
imperfections in the absorbing boundaries. 

The CM-5 machine used to solve the FDTD problem is located at the 
National Center for Supercomputing Applications (NCSA) at the University 
of Illinois. The program was written in CM Fortran and compiled using CMF 
version 2.1. The CM-5 at the NCSA has 512 nodes with vector units. CPU 
times were determined by running the problem on 32, 64, 128 and 256 node 
partitions. For this problem, a total of 0.5 million unknown field quantities 
(128 x 128 x 32 grid) were determined for 512 time steps. The CPU times 
are shown in Table 1. 

7. Conclusions 

A modified set of Maxwell’s equations have been introduced using com- 
plex coordinate stretching factors along the three cartesian coordinate axis. 
This modification introduces additional degrees of freedom in Maxwell’s equa- 
tions such that absorbing boundaries may be specified with zero reflection 
coefficient at all frequencies and all angles of incidence. The formulation 
was shown to be related to the perfectly matched layer that was recently 
derived by Berenger for 2-D wave propagation. A 3-D FDTD algorithm was 
developed from the modified Maxwell’s equations that uses the reflectionless 
absorbing interface property to implement radiation boundary conditions at 
the edges of the FDTD grid. The accuracy of the algorithm was validated by 
computing the field radiated from an infinitesimal electric dipole and com- 
paring against a known analytical expression. The FDTD algorithm was 
implemented on the Connection Machine CM-5 and timing results were pre- 
sented. This breakthrough in absorbing material boundary conditions allows 
EM scattering to be computed very efficiently on SIMD parallel computers. 


28 



Acknowledgement 

The authors wish to thank J. P. Berenger for sending us a preprint of his 

work and to A. Taflove for bringing to our attention J. P. Berenger’s work. 

References 

[1] K. S. Yee, “Numerical solution of initial boundary value problems in- 
volving Maxwell’s equations in isotropic media,” IEEE TYans. Antennas 
Propagat., vol. AP-14, pp. 302-307, 1966. 

[2] A. Taflove, “Review of the formulation and applications of the finite- 
difference time-domain method for numerical modeling of electromag- 
netic wave interactions with arbitrary structures,” Wave Motion, vol. 10, 
pp. 547-582, 1988. 

[3] W. C. Chew, Waves and Fields in Inhomogeneous Media. New York: 
Van Nostrand, 1990. 

[4] W. H. Weedon, W. C. Chew, and C. M. Rappaport, “Computationally 
efficient FDTD simulation of open-region scattering problems on the con- 
nection machine CM-5,” in IEEE Antennas and Propagation Society In- 
ternational Symposium Digest , (Seattle, WA), June 19-24, 1994. 

[5] B. Engquist and A. Majda, “Absorbing boundary conditions for the nu- 
merical simulation of waves,” Math. Computation , vol. 31, pp. 629-651, 
1977. 

[6] G. Mur, “Absorbing boundary conditions for the finite-difference approxi- 
mation of the time-domain electromagnetic field equations,” IEEE Trans. 
Electromag. Compat., vol. EMC-23, pp. 377-382, 1981. 

[7] Z. P. Liao, H. L. Wong, B. P. Yang, and Y. F. Yuan, “A transmit- 
ting boundary for transient wave analysis,” Scientia Sinica. (Series A), 
vol. 27, no. 10, pp. 1063-1076, 1984. 

[8] R. L. Higdon, “Numerical absorbing boundary conditions for the wave 
equation,” Math. Comput., vol. 49, pp. 65-90, 1987. 

[9] I. Katz, D. Parks, A. Wilson, M. Rotenberg, and J. Harren, “Non- 
reflective free space boundary conditions for SGEMP codes,” Systems, 
Science and Software, vol. SSS-R-76-2934, 1976. 

[10] R. Holland and J. W. Williams, “Total-field versus scattered-field finite- 
difference codes: A comparative assessment,” IEEE Trans. Nuclear Sci., 
vol. NS-30, pp. 4583-4588, 1983. 

[11] J.-P. Berenger in Actes du Collogue CEM, (Tregastel, France), 1983. 

[12] C. Cerjan, D. Kosloff, R. Kosloff, and M. Reshef, “A nonreflecting bound- 
ary condition for discrete acoustic and elastic wave equations,” Geo- 
physics, vol. 50, pp. 705-708, 1985. 

[13] R. Kosloff and D. Kosloff, “Absorbing boundaries for wave propagation 


29 



problems,” J. Computational Physics , vol. 63, pp. 363-376, 1986. 

[14] C. M. Rappaport and L. Bahrmasel, “An absorbing boundary condition 
based on anechoic absorber for EM scattering computation,” J. Electro- 
mag. Waves Appl . , vol. 6, no. 12, pp. 1621-1634, 1992. 

[15] J.-P. Berenger, “A perfectly matched layer for the absorption of electro- 
magnetic waves,” J. Computational Physics, pp. 10-41, March 1994. 


30 






Figure 3. Analytic and numerical FDTD solution overlaid for the 
E y field resulting from an infinitesimal electric dipole. 


Nodes 

CPU sec (Run 1, Run 2, Run 3, Avg.) 

32 

50.5, 50.2, 50.6; 50.4 

64 

29.9, 30.0, 30.0; 30.0 

128 

17.9, 18.4, 18.4; 18.2 

256 

12.4, 13.2, 12.7; 12.8 


Table 1 . CPU times for FDTD Problem on CM-5. 
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CHAPTER 3 


FAST MULTIPOLE METHOD SOLUTION 
USING PARAMETRIC GEOMETRY 


1. Introduction 

Practical electromagnetic problems are often three-dimensional (3-D) and 
involve arbitrary geometry. In the case when an object of curvature is of in- 
terest, the use of flat facets creates unnecessary artificial discretization in the 
solution. Recently, many researchers have been investigating the use of curved 
patches [1-3]. Wilkes and Cha [2] extended the flat triangular patch moment 
method solution developed by Rao, Wilton, and Glisson [4] to the curved 
triangular patch. This part presents a technique for computing the elec- 
tromagnetic radiation and scattering from 3-D conducting bodies of general 
shape. The arbitrary surface is described by dividing it into a number of con- 
nected patches which are mathematically described as parametric quadratic 
surfaces. The electric field integral equation (EFIE) is solved by standard 
MOM technique with specifically designed basic functions for subdomains 
which now contain surface curvature. 

2. Parametric Quadratic Surface Description 

An arbitrary surface with curvature can generally be represented using 
two parameters: tqand u 2 . The surface is then described by the equation 
t(ui,U 2 ). A differential tangent vector is given by 

dr dr 

dr = - — dui + - — du 2 (1) 

UU\ UU2 

Any vector tangential to the surface can be written as a linear combination of 
A^and Generally, the surface is composed of curvilinear patches. These 
patches are smoothly connected to each other at common boundaries. In 
this research, r(ui,u 2 ) is a second-order polynomial of rqand u 2 . Thus, nine 
points on each patch should be known with respect to the origin of a known 
coordinate system. The vector from the origin to the surface of the p-th patch 
may be written as 

uK , u 2 ) = 22 c »ut u i n ~ lu r 1 ( 2 ) 

m=l n^l 
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for tii from 0 to 1, w 2 from 0 to 1, where Cm\ are related to the known nine 
points coordinates, and can be solved from the following linear equations 


r p(0) 0) 

= r n 

r p (0, 0.5) 

= r 12 

r P (0, 1) 

= r i.i 

r p (0.5,0) 

= r 2l 

■ p (0.5,0.5) 

= r 22 

r P (0.5, 1) 

= r 2 3 

r p (l,0) 

= r 31 

r p (l, 0.5) 

— r 32 

r p(l> 1) 

= r 33 


(3) 


3. MOM Solutions 

For conducting objects, the EFIE is given by 



J(r')V 


AkR 


R 


-clS' 


47 xi 

hi] 


r-E'(r) 


(4) 


for r on surface S, where t is any unit tangent vector on S, E' is an impressed 
field which excites system, and it is isolated in an impressed field region, or 
an incident plane wave, and R = |r — r'|. 

The EFIE for the unknown electric current on the conducting surface 
induced by an incident wave is solved using standard Method of Moments 
(MOM) technique. Each patch is segmented into quadrilateral cells (in para- 
metric space, these cells are rectangular). The unknown current J(r) is first 
expanded in an appropriately chosen set of basis functions {j„ lm } and {j U2m } 

N 

j ( r )= X X a -"W r ) (5) 

q — tx i ,U2 n=l 

where a Qn are the unknown expansion coefficients. The basis functions used 
lie on the surface of a pair of curved quadrilateral cells is defined as 


ju lm = ( Ul )P U2m (u 2 ) ar 


<7(wi,'«2) 


du\ 


(G) 


jQll rr> ( „. \ u ( \ 

Jbl2m “ \ ” 77. 7~ \ ^ u 2m (^) ^U2lm V^l) 


<7(«i ,«a) 


du 7 


(7) 
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where T u is a triangular function, P u is a pulse function, and 


dr dr 
9 '* dui duj 


( 3 ) 


is known as the metric tensor. In differential geometry, it is often called the 
first principal form. The determinant of is 


9 — 9n 922 ~ 9n- (9) 

It can be proved that the divergences of basis functions are finite, and thus 
there are no artificial line charges on the patch. Furthermore, as the patch 
dimension becomes small compared to the radius of curvature, this basis 
function approaches the rooftop function for flat patches. Since currents flow 
across the junctions of connected patches, an additional set of overlap modes 
has been added to the usual expansions [5]. 

The EFIE for J(r) is discretized by substituting the above expansion in 
terms of unknowns a an . Then, rather than forcing the EFIE to be satisfied 
for r on surface 5, it is tested with integration along a line from the center of 
a cell to the center of adjacent cell (line matching), and with the same basis 
functions (Galerkins method). 

Usually the most important and difficult to evaluate terms in MOM ma- 
trix are the self impedance terms since the Green function contains an in- 
tegrate singularity at |r — r'|. The procedure used to treat these singulari- 
ties is to add and subtract a singular term from the integrals which can be 
integrated analytically and also renders the integrals well behaved so that 
standard numerical integration methods can be applied. The scalar potential 
integral can be written as 



e ikR 

du\ du? 

R 1 2 




du[du 2 + 



du[du 2 

Rq 


(10) 


The function Ro, which should have the same behavior as near the singularity, 
is developed by using a Taylor series approximation of r' near r. Thus R 0 is 
defined as 


Ro = \J9u (w'l -U\) 2 + 922 («2 - u 2 ) 2 + 2tfi2 (u'i - «i) («i ~ «i) (10a) 


Then the integral of l//? 0 can be found analytically. 


4. The Fast Multipole Algorithm 

As can be seen, the scattering by arbitrarily shape of conductor can be 
converted to finding the solutions of an integral equation where the unknown 
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function is the induced current distribution. The integral equation can be 
converted to a matrix equation by the method of moments (MOM). The 
resultant matrix equation is then solved by Gaussian elimination, which re- 
quires 0(N 3 ) floating-operations if Gaussian elimination is used to solve N 
linear equations, or 0(N 2 ) operations per iteration if the conjugate gradient 
(CG) method is used. 

The fast multipole method (FMM) [6,7] is designed to speed up the 
matrix-vector multiply in the CG method when it is used to solve the matrix 
equation. The idea is first to divide the subscatters into groups. Then, addi- 
tion theorem [8] is used to translate the scattered field of different scattering 
centers within a group into a single center. Hence, the number of scattering 
centers is reduced. Similarly, for each group, the field scattered by all the 
other group centers can be first “received” by the group center, and then 
“redistributed” to the subscatterers belonging to the group. 

The addition theorem [8,9] has the form 


0 tfc|x-fd| 00 

— T IT = ikJ2(-l) , {2l + l)j,{kd)h \ l \kx)P l (d-x) (11) 

x + d l U 


where ji is a spherical Bessel function of the first kind, /ij l * is a spherical 
Hankel function of the first kind, Pi is a Legendre polynomial, and d < x. 
Substituting the elementary identity [10, p. 410] 


4iri l ji{kd)P t {d ■ x) = J d 2 ke ik d P l {k ■ x) (12) 

into Equation (1) yields 

ifc|x+d| jU r " °° 

f = — / d 2 ke ikd V i l (2l + l)h\ l] (kx)P,{k ■ x) (13) 

l x + d l 4 W ^ 

~ J d 2 k elk d TL(c°s9). 

(14) 


We have truncated the sum of infinite series, where f d 2 k represents the 
integrals over the unit sphere, and 


L 

T] (cos 6) = + l)h\ 1) (kx)P l {cos6). (15) 

1=0 
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Letting r ; and r ; be the field point and source point, respectively, vve have 
= Tj - r, = Tj - r m + r m - r' m + r' m - r, 

= r jm + r mm' — r im'- 


Thus, the scalar Green’s function can be rewritten as 


j i ? i* /* * a 

= — / d 2 ke' k ^~ r ^ cw(r mm . • k) 

Tjx 47T J 


where 


• k) = y~V(2Z + l)/q (1 \fcr mm .)P,(rw • fc). 


( 16 ) 


(17) 


(18) 


(=0 


The integration in (17) will be evaluated by Gaussian quadratures with k = 
2L 2 points. 

For conducting objects, the electric field integral equation (EFIE) from 
(4) is written alternatively as 


where 

G(r,r') 

Using the scalar Green’s function (7), we get 


J(r')dS' = t— £ • E‘(r) 

K7] 

(19) 

i- A vv ' 

e ikrj, 

(20) 

fc 2 

Anrji 


G(r i ,r 1 ) = J d 2 k(I - fcfc)e ,k(rjm rim,) a mm >(f mm >-k). (21) 

Applying MOM to the EFIE with basis function j; and testing function tj, 
we transform the integral equation to matrix equation 


N 




t — 1 


where 


Aji = [ dStj(r) • [ dS'G(r,r') •j.(r') 

F ^// SMr)E ‘ (r) - 


( 22 ) 

(23) 

(14) 
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For non-nearby group pair (m,m'), Substituting Equation (21) into (23) 


yields 

Aji = ~ y d 2 kV fmj (k) • cw (k ■ f mm .)V: mli (k) 

(15) 

where 

Vsm'i(k ) = J ds'e ik rim ' [I — kk] ■ ji(r,- m >) 

(16) 


V fmj{k) = f ds'e ik r j m ' [I - kk} ■ t } (r im ) 

(17) 


When r ; and tj are same functions, this is Galerkin’s method. 

Before solving the matrix equation by CG, we need to calculate some ma- 
trix elements. First, we divide the N basis functions into G localized groups, 
labeled by an index m, each supporting about M — N/G basis functions. 
Second, for nearby group pairs (m,m'), we calculate the matrix elements by 
direct numerical computation. Third, we compute V sm ;(fc) and V/ m j(k) for 

K directions of k. Finally, we compute a mm <(k ■ r mm > ) for each non-nearby 
group pair (m,m'). 

4.1. Algorithm for Matrix-Vector Multiplication by FMM 

1 . 

s„(fc) = £ v; mi (fc)a,. (18) 

i€Gm 

This step requires O(KN) operations. 

2 . 

g m{k) = ^a mm .(fc • r mm i)S' m (k) . (19) 

m ! 

This step requires 0(kG(G — B )) operations, where B is the average 
nearby groups. 

3. 

N , 

AjiOi = A J' U ' + / • gm(fc), j € G m . (20) 

i=l m' ieG m , J 

The first term is the contribution from nearby groups (including itself), 
and the second term is the far interaction calculated by FMM. This step 
requires 0[BGM 2 ) + 0(KN) operations. 

To ensure that the Green’s function to converge to the desired accuracy, 
wc choose 

L = kD + ln(7r + kD) (21) 
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where D is the maximum diameter of a group size. Thus L is proportional 
to the size D, and K is proportional to the surface area of the group. Since 
the unknowns in each group (M) is proportional to the surface area too, 
then I< ~ M. The computation in the matrix-vector multiply requires 
T = C\NG + C 2 N 2 /G operations, where C\ and C 2 arc machine and imple- 
mentation dependent. The total operation count is minimized by choosing 
G = s/C^NjG. Therefore, T = 2 v /QC^V 1 - 5 . 

5. Ray Propagatin Fast Multipole Algorithm (RPFMA) 

In the above, ■ f mm >) translates the field radiated from a trans- 

mitting group in some dirction k into the received component in the same 
direction at a receiving group. We expect the interation to be strongest for 
fields radiated along the line joining the transmitting and receiving groups. 

Thus, we can ignore some k's when a mm >(k • r mm <) is too small compared to 
the maximum of ot mm >(k • r mm <). To take full advantage of this idea, we use 
a window function in the calculation of a„ un >(k ■ r mm ») (Equation (18) is a 
Fourier series computed with a square window in the variable /). Thus, de- 
noting the window function Wj, the elements of a mm > (k ■ r mm < ) are calculated 
as 

L 

a mm '(k ■ r mm >) = ^2 Wii l (2l + l)h\ l) (kr mm >)Pi{k ■ f mm .). (22) 
(=0 

Using this ray propagation idea, the cost of step 2 is reduced from KG 2 to 
KqG 2 , where Kq is independent of group size. The total operations of a 
RPFMA matrix-vector multiplication become 

T = C Z G 2 + C 2 N 2 /G. (23) 

It is minimized when G ~ N 2 / 3 , Therefore T ~ 7V 4 ^ 3 . 

6. Numerical Results 

Numerical implementations are verified by comparing the results with 
some known ones in published papers for a conducting plate with different 
shape (square, rectangular, triangular, disk, etc.). The program is tested on 
the case of a conducting sphere since it is one of the very few cases for which 
accurate and comprehensive data are available. Compared with Mie series, 
the bistatic Radar Cross Section (RCS) using curved quadrilateral patches is 
as good as using curved triangular patches [2]. The latter uses more input 
geometry data. 
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The RCS of a 14 inches NASA Almond is calculated at / = 2 GHz. The 
results agree with the experimental and numerical results given by Newman 
[5], 

The fast multiple method has also been implemented. Figure 2a shows 
the comparison of the CPU time versus the number of unknowns for the 
fast multipole method (FMM), the ray-propagation fast multipole algorithm 
(RPFMA), standard CG, and LU decomposition for calculating the bistatic 
RCS of a square metallic plate at normal incidence. The plate is discretized 
with 10 unknowns per wavelength. It is seen that the FMM and RPFMA 
both outperform the standard CG in terms of matrix fill and matrix solve. 
The FMM and RPFMA also require less memory, and hence, can solve a 
larger problem on a small computer. The simulation is performed on a 
SUN-SPARC-2 with 64 MB of RAM. Figure 2b shows a similar plot for a 
conducting sphere. 

Figure 3 shows the validation of the FMM with the Mie series solution for 
the bistatic RCS of a conducting sphere of radius 1 m and at frequency of 0.42 
GHz for the parallel polorization. Ten unknowns arc used per wavelength. 

Figure 4 shows the RCS of a wedge cylinder with plate extension having 
a total length of 3.73 m, and width of 2 m at 0.3 GHz. The plate is in the 
xy plane while the wave is incident at 80° from normal. The computation 
is done with LU decompostion requiring 2.5 hr of CPU time on a SUN- 
SPARC-2. Some points computed with FMM are shown, but it took FMM 
0.5 hr/point in these computations. 

Figure 5 shows the RCS of a one meter long NASA almond at 2.5 GHz 
in the xy plane with 0 — 90°. Five unknowns are used per wavelength. The 
calculation is done with LU decomposition on a SUN-SPARC-10 with 128 MB 
RAM, and it consumes about 24 hr of CPU time. Some points computed with 
FMM are shown. These points took 3 hr /point to compute. 

In summary, we describe the arbitrary shape of conducting body using 
curved patch. The EFIE is discretized by MOM with rooftop basis func- 
tions. Numerical results agree very well with those of existing studies. This 
approach needs less unknowns since it describes the object more accurately. 

We have compared various method of solving the resultant dense ma- 
trix equation: using LU decomposition, fast multipole method, and the ray- 
propagation fast multipole algorithm. For one bistatic RCS involving one 
incident angle, FMM and RPFMA outperform LU decomposition when the 
number of unknowns is more than 2,000. RPFMA only performs marginally 
better than FMM because the problem size we have been able to run is not 
large enough. 

For monostatic RCS involving many incident angles, FMM and RPFMA 
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does not show an advantage over LU decomposition. To have an advantage, 
the number of iterations in FMM and RPFMA will have to be reduced, and 
their computational complexity has to be further reduced. 
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Figure 1 . Curved quadrilateral patch defined by nine points. 
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Figure 2. (a) Comparison of the CPU time versus the number of 
unknowns for FMM, RPFMA, standard CG and LU decomposition 
for a square metallic plate, (b) The same as (a) but for a metallic 
sphere. 
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a=1m sphere, f=0.42GHz, 112 patches, 5600 unkws. Parallel Polarization 



Figure 3. Validation of FMM against the Mic series solution of the 
bistatic RCS of a conducting sphere of radius 1 m at 0.42 GHz. Ten 
unknowns arc used per wavelength. 


Wedge Cylinder with Plate Extension 



Figure 4. RCS of a wedge cylinder with plate extension from a 
NASA test case. The results are computed with LU decomposition, 
and partially with FMM. 
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RCS (dB Square Meters) 


1m NASA Almond, f=2.5GHz, 110 Patches, 3297 Unkws 



Figure 5. RCS of a one-meter long NASA almond at 2.5 GHz in 
the xy plane with 6 = 90°. Five unknowns are used per wavelength. 
The results arc computed with LU decomposition, and partially with 
FMM. 
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